Abstract

Character posing is of interest in computer animation. It is difficult because it depends on inverse kinematics (IK) techniques and on the articulated structure of human characters. To solve the IK problem, classical methods that rely on numerical solutions often suffer from under-determination and cannot guarantee naturalness. Existing data-driven methods address this problem by learning from motion capture data. When facing a large variety of poses, however, these methods may not be able to capture the pose styles or be applicable in real-time environments. Inspired by the low-rank motion de-noising and completion model in [LYL11], we propose a novel model for character posing based on sparse coding. Unlike conventional approaches, our model directly captures the pose styles in Euclidean space to provide intuitive training error measurements and facilitate pose synthesis. A pose dictionary is learned in the training stage, and based on it natural poses are synthesized to satisfy users' constraints. We compare our model with existing models on the tasks of pose de-noising and completion. Experiments show that our model obtains lower de-noising and completion errors. We also provide user interface (UI) examples illustrating that our model is effective for interactive character posing.

Categories and Subject Descriptors (according to ACM CCS): 1.7 [Computer Graphics]: Computer Graphics — 
Animation 



1. Introduction 

Character posing is an important step in key-frame animation. It is difficult for novices and even skilled artists because of the articulated nature of human motion. With the mouse still being the most prevalent input device, the users' input can only provide basic information such as 2D screen coordinates. Based on this limited information, it is challenging to generate satisfactory character poses efficiently. Moreover, considerable information is required to determine all of the character's degrees of freedom (DOFs). Given 3D positional information for one or several joints, it is useful to re-position the rest of the joints, or even the whole pose if the information provided by users is not accurate. For example, a novice animator is likely to pose an unnatural character within a short time limit. It is then up to the algorithm to extract useful information such as pose style from the unnatural character and create a new natural one. This should be carried out interactively to synthesize natural poses, and the process is referred to as character posing.

To solve the character posing problem, inverse kinematics is often necessary to find the skeleton in angle space representation. Classical inverse kinematics solves an under-determined non-linear system to find the joint angles. One popular method is to exploit gradient information, that is, to construct the Jacobian matrix and then solve the system iteratively starting from a random initial point. However, the mapping from 3D Euclidean space to joint angle space is one-to-many if the users' constraints are insufficient. For example, given a set of incomplete joint constraints such as the 3D positions of some joints, the solution obtained from the Jacobian method is not unique but depends on the initialization, and the poses resulting from the possible solutions are unlikely to all be natural. One not only has to narrow down the solution set, but must also refine the solutions so that the resulting pose is natural.

One way that may help is to learn from motion capture data. Even though the space of joint configurations is large, the desirable poses span a much smaller space. For example, human beings can make a large number of poses, but the space of natural poses is smaller. By recording these poses in motion capture data and learning from them, we can provide heuristics for solving the IK problem. This is the approach recently taken by researchers and is referred to as data-driven IK. Our approach falls into this category.

Figure 1: Framework of the proposed approach.

The framework of our model is shown in Figure 1. In our proposed model, each pose is assumed to have a sparse representation given a pose dictionary. The pose dictionary is learned from motion capture data in Euclidean space. These data cover a large number of different motion styles, such as walking, running and other sports activities. In the interactive character posing stage, our model can respond to the users' inputs and constraints in real time and construct natural poses that meet the users' intentions. We solve the pose synthesis problem by breaking the optimization problem into three components: 1) finding the sparse coefficients and rotation parameters; 2) normalizing the pose to determine the scaling parameter; and 3) building the output pose in angle space by the Jacobian method. Details on our model and how to solve the optimization problem are presented in Section 3.

The rest of this paper is organized as follows. We review the related work in the remainder of this section. Starting from subspace models, we present the inspiration for this work and derive our model in Section 2. In Section 3 we introduce our proposed model and the algorithm for solving it in detail, followed by applications and experimental results in Section 4. We present some discussion and conclude our work in Section 5.

1.1. Related work 

Classical inverse kinematics. The use of the Jacobian matrix for inverse kinematics can be traced back at least to [GM85], in which Girard et al. linearised the equation $\mathbf{p} = f(\mathbf{q})$ at the current estimate $\mathbf{q}$, yielding $\mathbf{p} = f(\mathbf{q}) + J(\mathbf{q})^{\top} \Delta\mathbf{q}$, where $f$ is the forward-kinematics function, which involves a set of translations and rotations and is usually implemented procedurally in some programming language, and $J(\mathbf{q})$ is the Jacobian matrix, defined as $J(\mathbf{q}) = \partial f(\mathbf{q})^{\top} / \partial \mathbf{q}$ (one row per joint angle). The Jacobian matrix is usually not full-rank, and the update $\Delta\mathbf{q}$ is given by $\Delta\mathbf{q} = J(\mathbf{q})^{\dagger} (\mathbf{p} - f(\mathbf{q}))$, where $J(\mathbf{q})^{\dagger} = (J(\mathbf{q}) J(\mathbf{q})^{\top} + \eta \mathbf{I})^{-1} J(\mathbf{q})$ is the pseudo-inverse of the Jacobian matrix and $\eta$ is a small positive number. To accommodate constraints such as angle limits or spatial relations, Zhao et al. [ZB94] minimized the $\ell_2$ distance between the input pose and the forward-kinematics function subject to the constraints, using non-linear programming. Rose et al. [RRP97] solved the IK problem using the BFGS optimization method [GMW81], a quasi-Newton algorithm that does not require the complicated Hessian matrix. Their work aimed at building the final motion in angle representation from sensor data, which were obtained from the motion capture process in Euclidean space.
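As a rough illustration of the damped Jacobian update described above, the following sketch solves IK for a planar two-link arm. The arm, its link lengths, the starting point and the function names are assumptions made for this example only and are not taken from [GM85]. The code uses the common layout in which the Jacobian has one row per output coordinate, so the update reads $\Delta\mathbf{q} = J^{\top}(J J^{\top} + \eta \mathbf{I})^{-1}(\mathbf{p} - f(\mathbf{q}))$, which is the transposed form of the expression in the text.

```python
import numpy as np

def forward_kinematics(q, lengths=(1.0, 1.0)):
    """End-effector position of a planar two-link arm (toy stand-in for f)."""
    l1, l2 = lengths
    return np.array([
        l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
        l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1]),
    ])

def jacobian(q, lengths=(1.0, 1.0)):
    """Analytic Jacobian df/dq, shape (2, 2): one row per output coordinate."""
    l1, l2 = lengths
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [l1 * c1 + l2 * c12, l2 * c12],
    ])

def solve_ik(target, q0, eta=1e-3, iters=200, tol=1e-8):
    """Iterate dq = J^T (J J^T + eta I)^-1 (p - f(q)) until the residual is small."""
    q = np.asarray(q0, dtype=float).copy()
    for _ in range(iters):
        residual = target - forward_kinematics(q)
        if np.linalg.norm(residual) < tol:
            break
        J = jacobian(q)
        q += J.T @ np.linalg.solve(J @ J.T + eta * np.eye(2), residual)
    return q

target = np.array([1.2, 0.8])            # reachable: |target| < l1 + l2
q = solve_ik(target, q0=[0.0, 1.2])      # start reasonably close to a solution
error = np.linalg.norm(forward_kinematics(q) - target)
```

The result depends on the starting point `q0`: a different initialization can converge to the mirrored elbow configuration, which is the non-uniqueness discussed in the introduction.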

Data-driven inverse kinematics. In general, data-driven IK leverages mocap data and models the IK problem as follows:

$$\min \; E_{\mathrm{prior}} + \lambda E_{\mathrm{IK}} \qquad (1)$$

where $E_{\mathrm{IK}}$ is the energy term that measures the (negative log) observation likelihood, and $E_{\mathrm{prior}}$ is the energy term that measures the (negative log) probability of the current pose under some prior distribution. This prior distribution is what distinguishes one model from another.

A straightforward way to model the prior is to use the Gaussian distribution. Through the covariance matrix, this approach is related to principal component analysis (PCA), which restricts the solution to lie in the subspace spanned by the principal components. By imposing the Gaussian prior, we force the solution to approach the mean along the direction of one principal component or a linear combination of them. Instead of using the Gaussian model directly, we can also first partition the motion data with a clustering algorithm and then build a Gaussian prior for each cluster. This is similar to the mixture of local linear models, which has been used as a baseline model in [CH05].

Wei et al. [WC11] modelled the prior using the mixture of factor analyzers (MFA) [GH96]. MFA is similar to a mixture of Gaussians, but includes a dimension reduction component and avoids the ill-conditioning of the covariance matrix. Kallmann et al. [Kal08] introduced analytical IK for the arms instead of the whole body. To construct a natural whole-body pose, a set of pre-designed key body poses is used for pose blending (interpolation). Grochow et al. [GMHP04] proposed style-based IK (SIK) for modelling human motion. Their model is based on the scaled Gaussian process latent variable model (GPLVM) [Law04]. Specifically, the training samples and the target poses are mapped to low-dimensional latent variables using GPLVM. These variables are connected by a kernel function. The information then passes from the training set through the correlation of latent variables to the target pose. Their model can generalize to unseen poses thanks to the good generalization ability of Gaussian processes. However, due to a limitation of Gaussian processes, the complexity of their approach is asymptotically cubic in the size of the training set. To reduce complexity, they maintained an active set during training and synthesis. Despite the improved efficiency brought by the active set, it is still prohibitive to learn from large-scale pose data for real-time applications. To further improve efficiency, Wu et al. [WTR11] considered different approximations to speed up Gaussian processes and applied them to the IK problem. Other than GPLVM and its variants, modelling motion based on dimension reduction is also very popular. Examples are methods based on state models [BH00, LWS02] and on PCA [SHP04].

Given incomplete measurements from motion capture sensors, Chai et al. [CH05] constructed full-body human motion using a local PCA (LPCA) model. The basic idea is to incrementally estimate the current pose based on the previously estimated poses and a motion capture database. The prior term measures the deviation of the reconstructed poses from the motion capture database. Since the database can be large and heterogeneous, they introduced the LPCA model: given an incomplete pose, they first search the database to find the nearest pose and build a Gaussian motion prior around its neighbourhood. The prior is then used for pose synthesis.

Motion data de-noising and completion was considered by Lou et al. [LC10]. The idea is to first construct a set of filter bases from the motion capture data and use them for motion completion or de-noising. The resulting motion is the solution to a cost function that consists of the bases-representation error and the observation likelihood. The filter bases capture spatial-temporal patterns of human motion, and the de-noising process also relies on these spatial-temporal patterns, which do not exist in our case. Another work in this direction was introduced by Lai et al. [LYL11], in which a low-rank matrix completion algorithm was used for unsupervised motion de-noising and completion. The major difference between our working environment and both of these is that we do not have temporal information available when synthesizing new poses.

Sparse coding. On the other hand, sparse representation has been widely applied to image processing and pattern recognition. Examples include face recognition [WYG*09] and image super-resolution [YWHM08]. For modelling human motion, [LFAJ10] considered each joint's movement as a signal that admits a sparse representation over a set of basis functions. These basis functions are learned from motion capture data. They demonstrated that the proposed model is useful for action retrieval and classification. Our work is different, as we model each pose separately and our target application is character posing instead of action retrieval and classification.

Summary. Among the existing models, the numerical IK algorithms do not have access to motion capture data and thus cannot guarantee the naturalness of synthesized poses. Among data-driven models, the Gaussian model introduces large errors for large datasets, as it tries to approximate a complicated underlying distribution by a single Gaussian. Imposing a clustering step before applying the Gaussian model is an improvement but leaves us with the problem of choosing the number of clusters. For LPCA, searching for the pose on-line is too slow when learning from a large training set, and the model accuracy depends on both the search result and the neighbourhood size. For SIK, the complexity is prohibitive even for a moderate-scale training set, and the active-set approximation has difficulty capturing the diversity of motion styles. For MFA, the introduction of a diagonal matrix in the covariance avoids ill-conditioning when the number of clusters is large (and thus the number of poses in each cluster is small). However, being a variant of the Gaussian model, it is still at risk of under-fitting complicated data. Apart from the limitations mentioned above, most of these models are probabilistic and the training error is measured by likelihood, which is not intuitive: given such a measurement, it is not straightforward to determine whether the model is adequate for fitting the data. Consequently, it is hard to choose the model parameters (e.g., the number of clusters). In contrast, our model measures the training error directly by the mean square error, which is very intuitive for determining model parameters. Besides, our model can learn from large datasets (up to millions of poses) with an arbitrarily small training error, although at the expense of increasing the size of the dictionary. We found that this increase does not cause over-fitting and that the de-noising and completion algorithm remains efficient, as the complexity of our synthesis algorithm is linear in the size of the pose dictionary. We present some real-time applications demonstrating that our model is effective for interactive character posing. We also compare our model with the existing models to test the performance of pose completion when a large proportion of joints is missing, and of pose de-noising when the pose is corrupted by dense and sparse Gaussian noise. Experimental results show that our model has lower completion and de-noising errors.

Contributions. Our contributions are two-fold: 1) starting from the prevailing subspace models of human motion, we propose a sparse representation of poses for character posing; to our knowledge, we are the first to propose sparse coding for character posing; 2) unlike previous approaches, we propose to learn from the motion capture data in Euclidean space, which not only provides an intuitive measurement of the training error, but also facilitates sparse coding and pose synthesis.

2. Overview of proposed model

Notation. In this paper, matrices, vectors and scalars are denoted by bold upper-case, bold lower-case and non-bold lower-case letters respectively. $\|\cdot\|_2$, $\|\cdot\|_0$ and $\|\cdot\|_F$ denote the vector $\ell_2$-norm, the vector $\ell_0$-(pseudo)norm and the matrix Frobenius norm respectively. $\|\cdot\|_0$ measures the number of non-zero components in a vector.

From low-rank approximation of motion to sparse representation of poses. As the movements of the body parts are correlated, a human motion represented as a matrix is approximately low-rank, with a fast-decaying spectrum [LYL11]. Low-rank approximation is therefore effective for modelling human motion. Our work in this paper is inspired by the low-rank motion completion approach proposed by Lai et al. [LYL11], in which the rank of a motion is minimized for completing and de-noising human motion. The connection between the rank function and $\ell_0$-norm minimization is clear if we observe that the rank of a diagonal matrix equals the $\ell_0$-norm of its diagonal vector.

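Let $\{\mathbf{y}_i \in \mathbb{R}^D\}_{i=1,\dots,m}$ denote a set of poses and $\mathbf{Y}$ be the motion matrix: $\mathbf{Y} = [\mathbf{y}_1, \dots, \mathbf{y}_m]$. The singular value decomposition (SVD) of $\mathbf{Y}$ gives $\mathbf{Y} = [\mathbf{u}_1, \dots, \mathbf{u}_D]\,\mathrm{diag}([s_1, \dots, s_D])\,[\mathbf{v}_1, \dots, \mathbf{v}_D]^{\top}$, where $\mathbf{u}_i \in \mathbb{R}^D$ and $\mathbf{v}_i \in \mathbb{R}^m$. By setting $s_{k+1}, \dots, s_D$ to zero, we can approximate each pose $\mathbf{y}_i$ by the first $k$ singular vectors: $\mathbf{y}_i \approx \sum_{j=1}^{k} \mathbf{u}_j \alpha_j^{(i)}$, where $\alpha_j^{(i)} = s_j v_j^{(i)}$ and $v_j^{(i)}$ is the $i$th component of $\mathbf{v}_j$. The number $k$ can be small and still give a good approximation of human motions. In other words, each pose has a sparse representation given the set of bases $\{\mathbf{u}_j\}_{j=1,\dots,D}$, and the support in this case lies in the first $k$ bases.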
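The truncated SVD above can be sketched in a few lines of NumPy. The motion matrix here is synthetic (a random rank-$k$ matrix plus small noise), chosen only so that the example runs on its own; it is not motion capture data, and the sizes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, m, k = 69, 500, 5              # pose dimension, number of poses, kept rank

# Synthetic stand-in for a motion matrix: rank-k structure plus small noise.
Y = rng.standard_normal((D, k)) @ rng.standard_normal((k, m))
Y += 0.01 * rng.standard_normal((D, m))

U, s, Vt = np.linalg.svd(Y, full_matrices=False)   # Y = U diag(s) Vt

# Coefficients alpha_j^(i) = s_j * v_j^(i): one column of length k per pose.
alpha = s[:k, None] * Vt[:k, :]
Y_k = U[:, :k] @ alpha                              # rank-k approximation of Y

rel_err = np.linalg.norm(Y - Y_k) / np.linalg.norm(Y)
```

Each column of `alpha` is the coefficient vector of one pose over the first `k` left singular vectors; the Frobenius error of the approximation equals the root sum of squares of the discarded singular values.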

Following this idea, to learn from a large motion capture dataset, one possible way is to first partition the whole training set into $K$ clusters and find a set of bases $\mathbf{U}_i$ for cluster $i$. If we then collect all the bases into a matrix, i.e., $\mathbf{U} = [\mathbf{U}_1, \dots, \mathbf{U}_K]$, each pose in the training set will be sparsely represented under this matrix. This approach is general, as we can set $K$ to 1 to recover the original low-rank approximation of the whole dataset, and set $K$ to the size of the dataset to use the whole dataset as bases. However, this leaves us with the problem of finding a proper way to partition the data into clusters. If the poses in a cluster are too linearly uncorrelated because of an improper partition (e.g., too many diverse poses in a cluster), then the sparse approximation error will tend to be large. On the other hand, if each cluster consists of only a few poses, the sparse approximation error is small but the number of bases will be very large, with the limit being the size of the training set.

Instead of determining the matrix $\mathbf{U}$ in the above manner, we take a different route: we learn the matrix from the training set without worrying about the partitioning. To begin with, we return to the one-cluster case and note that, from an optimization perspective, the bases $\mathbf{U} = [\mathbf{u}_1, \dots, \mathbf{u}_k]$ obtained from the SVD are a solution to the following optimization problem with variables $\mathbf{U} \in \mathbb{R}^{D \times k}$ and $\mathbf{X} \in \mathbb{R}^{k \times m}$:

$$\min \; \|\mathbf{Y} - \mathbf{U}\mathbf{X}\|_F^2 \qquad (2)$$
$$\text{s.t.} \quad \mathbf{U}^{\top}\mathbf{U} = \mathbf{I}$$

As a relaxation, the orthonormality constraint on $\mathbf{U}$ is changed to a unit-ball constraint on its columns, and the size of $\mathbf{U}$ is extended to $D \times n$ with $n > D$. In return, each column of $\mathbf{X}$ must be sparse to reflect the sparsity of poses. The sparsity is measured by the $\ell_0$-norm of each column of $\mathbf{X}$. We refer to the matrix $\mathbf{U}$ in this case as the pose dictionary and denote it by $\mathbf{A}$ to be consistent with the sparse coding literature. We present the pose dictionary learning process in the next section.

Modelling the poses in Euclidean space. In the analysis presented above, we made no assumption on the ambient space of poses. Although the motion matrix is low-rank in both Euclidean space and angle space (when preprocessed properly), we choose the former for sparse representation. This is different from previous data-driven approaches, which directly model the motion in angle space. We do so for two reasons.

One reason is to avoid the periodicity of angles, which potentially corrupts the sparse representation: given two identical poses, if we add $2\pi$ to one pose vector while leaving the other unchanged, the resulting two poses are still identical, but the $2\pi$-shifted one is unlikely to have the same sparse representation as the other under the same dictionary. This is a problem especially in pose synthesis, due to the non-smoothness of the $\ell_0$-norm. The same does not happen for poses represented in Euclidean space. Other parametrizations such as quaternions and the exponential map are also non-linear, and thus inconvenient for sparse coding, which involves solving a linear system.
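The periodicity argument can be checked numerically on a toy case. The sketch below uses a single bone in the plane, which is an assumption made purely for illustration: shifting the joint angle by $2\pi$ changes the angle-space vector by a large amount, while the Euclidean position of the joint is unchanged up to rounding.

```python
import numpy as np

def joint_position(theta, bone_length=1.0):
    """2D position of the joint at the end of a single bone rotated by theta."""
    return bone_length * np.array([np.cos(theta), np.sin(theta)])

theta = 0.7
shifted = theta + 2.0 * np.pi

# Distance between the two descriptions of the same pose in each space.
angle_gap = abs(shifted - theta)
position_gap = np.linalg.norm(joint_position(shifted) - joint_position(theta))
```

Here `angle_gap` is $2\pi$ while `position_gap` is zero up to floating-point error, so a dictionary learned on joint coordinates sees the two inputs as the same vector.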

The other reason is that by doing so, we can directly measure the representation error, which provides an intuitive measurement in training. Specifically, we directly optimize the (mean) square error of the sparse representation of the training set without invoking the forward-kinematics mapping. Moreover, since input observations such as edited poses and 2D/3D coordinates are in Euclidean space, the optimization for pose synthesis is more efficient, because we can defer the need for the Jacobian matrix until we have found a pose represented by a full set of joint coordinates. Converting the pose into angle space in the last stage (see Figure 1) is only necessary when we want to process the pose further, such as changing the skeleton configuration (e.g., joint angle limits, bone lengths).

3. Learning sparse representation of poses for character posing

The idea of modelling poses based on sparse coding is similar to subspace approaches such as PCA, except that the 'subspace' is generalized to the span of the active atoms in the pose dictionary and the 'bases' are no longer assumed to be orthonormal or even independent. Given a set of observations and constraints, the reconstructed pose is a trade-off between having a sparse support under the pose dictionary and being consistent with the observations and constraints.

3.1. Learning the pose dictionary 

Before applying sparse representation, we need to first determine the underlying dictionary, which should be able to capture the pose variations and be insensitive to global orientation and translation. Given a training set $\{\mathbf{y}_i \in \mathbb{R}^D\}_{i=1,\dots,m}$, the pose dictionary $\mathbf{A} \in \mathbb{R}^{D \times n}$ is learned such that the poses in the training set are sparsely represented under this dictionary. Specifically, the learning problem is modelled as

$$\min \; \|\mathbf{Y} - \mathbf{A}\mathbf{X}\|_F^2 \qquad (3)$$
$$\text{s.t.} \quad \|\mathbf{x}_j\|_0 \le K, \; j = 1, \dots, m$$
$$\|\mathbf{a}_k\|_2 = 1, \; k = 1, \dots, n$$

where $\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_m]$, $\mathbf{Y} = [\mathbf{y}_1, \dots, \mathbf{y}_m]$, $\mathbf{a}_k$ is the $k$th column of $\mathbf{A}$, and $\mathbf{y}_i$ is the $i$th pose in the training set, with global orientation and translation set to zero since they are usually irrelevant to the pose style. Note the similarity between problems (3) and (2).

To solve the above learning problem, we use the K-SVD algorithm proposed by Aharon et al. [AEB06]. K-SVD alternates between sparse coding and dictionary updating in every iteration. Specifically, in the sparse coding stage, orthogonal matching pursuit (OMP) [PRK93] is used to find a sparse representation of the training set, while in the dictionary updating stage, the columns of the dictionary are updated sequentially by computing the singular value decomposition of the sparse coding residual matrix. It is reported that this method is better than the naive method of updating $\mathbf{A}$ by computing the least-squares solution with $\mathbf{X}$ fixed. We refer readers to [AEB06] for details.
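To make the dictionary updating stage more concrete, the following is a minimal sketch of a single K-SVD atom update, assuming dense NumPy arrays and random data. The function name `ksvd_update_atom` and the toy sizes are hypothetical; this is an illustration of the step described above, not the implementation used in our experiments or in [AEB06].

```python
import numpy as np

def ksvd_update_atom(Y, A, X, k):
    """Update atom k of A and row k of X in place (one K-SVD atom update)."""
    users = np.flatnonzero(X[k, :])          # poses whose code uses atom k
    if users.size == 0:
        return                               # unused atom: leave it unchanged
    # Residual of those poses with the contribution of atom k removed.
    E = Y[:, users] - A @ X[:, users] + np.outer(A[:, k], X[k, users])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    A[:, k] = U[:, 0]                        # best rank-1 fit, unit norm
    X[k, users] = s[0] * Vt[0, :]            # the sparsity pattern is preserved

rng = np.random.default_rng(0)
D, n, m, K = 12, 20, 200, 3                  # toy sizes, chosen arbitrarily
A = rng.standard_normal((D, n))
A /= np.linalg.norm(A, axis=0)               # unit-norm atoms
X = np.zeros((n, m))
for j in range(m):
    support = rng.choice(n, size=K, replace=False)
    X[support, j] = rng.standard_normal(K)   # random K-sparse codes
Y = rng.standard_normal((D, m))

before = np.linalg.norm(Y - A @ X)
for k in range(n):
    ksvd_update_atom(Y, A, X, k)
after = np.linalg.norm(Y - A @ X)
```

Because each update replaces a rank-1 term by the best rank-1 fit of the corresponding residual, the representation error cannot increase from `before` to `after`, and every column of `X` keeps at most `K` non-zeros.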

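3.2. Pose synthesis problem

Now, assuming that the pose dictionary $\mathbf{A}$ is given, we propose the following model for pose reconstruction:

$$(P_0) \quad \min \; \gamma \|\mathbf{A}\mathbf{x} - s \cdot f(\mathbf{q})\|_2^2 + \|\mathbf{P}\{\boldsymbol{\tau} \circ (s \cdot f(\mathbf{q})) - \tilde{\mathbf{y}}_0\}\|_2^2 + \|\mathbf{W}(\boldsymbol{\tau} - \boldsymbol{\tau}_0)\|_2^2 \qquad (4)$$
$$\text{s.t.} \quad \|\mathbf{x}\|_0 \le K \qquad (5)$$
$$s > 0 \qquad (6)$$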

In the objective (4), the first term measures the difference between the sparse representation and the forward-kinematics function $f$ scaled by a positive factor $s$. The scale $s$ is applied to all three dimensions in Euclidean space to maintain the skeleton scaling ratio. The constraint (5) guarantees the sparsity of $\mathbf{x}$ in the solution.

The second term measures the sparse coding error of the input under a rigid-body rotation, where $\boldsymbol{\tau} \in \mathbb{R}^3$ is the rotation parameter. The notation $\boldsymbol{\tau} \circ (\mathbf{t})$ denotes a 3D rotation of the vector $\mathbf{t}$, which is a concatenation of a set of 3D points; $\mathbf{y}_0$ is the input pose and $\tilde{\mathbf{y}}_0$ is the root-shifted version of $\mathbf{y}_0$. $\mathbf{P}$ is a diagonal matrix whose diagonal entries are either 1 or 0, indicating whether the corresponding entry of the input pose is available or not. By introducing $\mathbf{P}$, we allow the input observation to be incomplete, while the formula still covers complete observations by setting $\mathbf{P}$ to the identity matrix. This conveniently models the users' constraints specifying the fixed and moving (or missing) joints.

The final term provides a prior constraint on the rotation parameters, where the diagonal matrix $\mathbf{W}$ gives a weight for each of the three rotation parameters. Usually, the weight on the second rotation parameter $\tau_y$ (rotation around the y-axis) should be larger than the other two, as this rotation is usually more common.

By solving this problem, we find a pose that on the one hand stays close to the input pose up to a similarity transform, and on the other hand admits a sparse representation given the learned dictionary. The input pose can be incomplete or corrupted by noise, and it can also consist of 2D point clouds obtained from an image, as shown in our experiments in the next section.

The optimization variables in problem $(P_0)$ are the output pose $\mathbf{q}$, the sparse coefficients $\mathbf{x}$, the rotation parameters $\boldsymbol{\tau}$ and the positive scaling $s$. To solve the problem, we first find $\mathbf{x}$ and $\boldsymbol{\tau}$ alternately: in each iteration we first fix $\boldsymbol{\tau}$ and find $\mathbf{x}$ using the OMP algorithm, and then fix $\mathbf{x}$ and find $\boldsymbol{\tau}$ by gradient descent. Based on the $\mathbf{x}$ and $\boldsymbol{\tau}$ found, we then calculate the scaling $s$ by an algorithm referred to as Pose Normalization, after which we finally determine the output pose by the Jacobian method. The framework for solving problem $(P_0)$ is shown in Figure 1, and the optimization details are presented in the next subsection.

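3.3. Solving the pose synthesis problem $(P_0)$

To solve problem $(P_0)$ efficiently, we first assume that, given $\mathbf{x}$, we can find a positive $s$ and a $\mathbf{q}$ such that $\mathbf{A}\mathbf{x} = s \cdot f(\mathbf{q})$ (approximately) holds. Substituting this into (4), we arrive at the following:

$$(P_1) \quad \min \; \|\mathbf{P}\{\boldsymbol{\tau} \circ (\mathbf{A}\mathbf{x}) - \tilde{\mathbf{y}}_0\}\|_2^2 + \|\mathbf{W}(\boldsymbol{\tau} - \boldsymbol{\tau}_0)\|_2^2 \qquad (7)$$
$$\text{s.t.} \quad \|\mathbf{x}\|_0 \le K \qquad (8)$$

We use the alternating minimization framework to solve problem $(P_1)$. More specifically, we first solve the sparse coding problem by fixing $\boldsymbol{\tau}$:

$$\min \; \|\mathbf{P}\{\mathbf{A}\mathbf{x} - \boldsymbol{\tau}^{-1} \circ \tilde{\mathbf{y}}_0\}\|_2^2 \qquad (9)$$
$$\text{s.t.} \quad \|\mathbf{x}\|_0 \le K \qquad (10)$$

where $\boldsymbol{\tau}^{-1}$ denotes the inverse of the rigid-body rotation $\boldsymbol{\tau}$. Let $\mathbf{v} = \boldsymbol{\tau}^{-1} \circ \tilde{\mathbf{y}}_0$; then the above problem is equivalent to solving

$$(P_{1a}) \quad \min \; \|\mathbf{A}_P \mathbf{x} - \mathbf{v}_P\|_2^2 \qquad (11)$$
$$\text{s.t.} \quad \|\mathbf{x}\|_0 \le K$$

where $\mathbf{A}_P$ is the extraction of the rows of $\mathbf{A}$ that correspond to the non-zero diagonal entries of $\mathbf{P}$, and likewise for $\mathbf{v}_P$. Problem $(P_{1a})$ is solved by OMP.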
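The row extraction in $(P_{1a})$ and the OMP step can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary is random rather than learned, the set of missing joints is hypothetical, and `omp` is a plain textbook orthogonal matching pursuit written for this example, not the solver used in our experiments.

```python
import numpy as np

def omp(A, y, K, tol=1e-10):
    """Orthogonal matching pursuit: greedy least squares with at most K atoms."""
    norms = np.maximum(np.linalg.norm(A, axis=0), 1e-12)
    support, coef = [], np.zeros(0)
    residual = y.copy()
    for _ in range(K):
        if np.linalg.norm(residual) < tol:
            break
        scores = np.abs(A.T @ residual) / norms   # correlation with each atom
        if support:
            scores[support] = -np.inf             # never pick an atom twice
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    if support:
        x[support] = coef
    return x

rng = np.random.default_rng(1)
n_joints, n_atoms, K = 23, 60, 3
D = 3 * n_joints
A = rng.standard_normal((D, n_atoms))             # random stand-in dictionary
A /= np.linalg.norm(A, axis=0)

x_true = np.zeros(n_atoms)
x_true[rng.choice(n_atoms, size=K, replace=False)] = rng.standard_normal(K)
v = A @ x_true                                    # stands in for tau^{-1} o y0

observed_joints = np.ones(n_joints, dtype=bool)
observed_joints[[3, 4, 7, 8]] = False             # hypothetical missing joints
mask = np.repeat(observed_joints, 3)              # diagonal of P, per coordinate

A_P, v_P = A[mask, :], v[mask]                    # keep only the observed rows
x_hat = omp(A_P, v_P, K)
completed = A @ x_hat                             # full pose with missing joints filled in
observed_residual = np.linalg.norm(A_P @ x_hat - v_P)
```

The sparse code is estimated from the observed coordinates only, and the missing joints are then read off from the full dictionary through `A @ x_hat`. OMP is greedy, so exact recovery of `x_true` is not guaranteed in general.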

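We then find the rotation parameters $\boldsymbol{\tau}$ by fixing the sparse coefficients $\mathbf{x}$. That is, we solve the following unconstrained sub-problem:

$$(P_{1b}) \quad \min \; \|\mathbf{P}\{\boldsymbol{\tau} \circ (\mathbf{A}\mathbf{x}) - \tilde{\mathbf{y}}_0\}\|_2^2 + \|\mathbf{W}(\boldsymbol{\tau} - \boldsymbol{\tau}_0)\|_2^2 \qquad (12)$$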
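As a rough sketch of the rotation sub-problem $(P_{1b})$, the code below fits three rotation angles by gradient descent with a backtracking line search. Several choices are assumptions made for illustration: the rotation order (x, then y, then z), the central-difference gradient used in place of an analytic one, the weights, and the random stand-in for $\mathbf{A}\mathbf{x}$.

```python
import numpy as np

def rotation_matrix(tau):
    """Rotation about x, then y, then z by the angles tau = (tx, ty, tz)."""
    cx, cy, cz = np.cos(tau)
    sx, sy, sz = np.sin(tau)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def rotate(tau, pose):
    """Apply the rotation to every 3D joint of a flattened pose vector."""
    return (pose.reshape(-1, 3) @ rotation_matrix(tau).T).ravel()

def objective(tau, pose, target, mask, w, tau0):
    r = mask * (rotate(tau, pose) - target)        # masked data term
    return r @ r + np.sum((w * (tau - tau0)) ** 2)

def numerical_gradient(fun, tau, h=1e-6):
    grad = np.zeros(3)
    for i in range(3):
        step = np.zeros(3)
        step[i] = h
        grad[i] = (fun(tau + step) - fun(tau - step)) / (2 * h)
    return grad

def fit_rotation(pose, target, mask, w, tau0, iters=100):
    fun = lambda t: objective(t, pose, target, mask, w, tau0)
    tau = np.array(tau0, dtype=float)
    for _ in range(iters):
        g = numerical_gradient(fun, tau)
        lr = 1.0
        # Backtracking (Armijo) line search: shrink the step until it decreases the objective.
        while lr > 1e-12 and fun(tau - lr * g) > fun(tau) - 1e-4 * lr * (g @ g):
            lr *= 0.5
        if lr <= 1e-12:
            break
        tau = tau - lr * g
    return tau

rng = np.random.default_rng(0)
pose = rng.standard_normal(69)                     # stand-in for A x
target = rotate(np.array([0.0, 0.6, 0.0]), pose)   # input rotated about the y-axis
mask = np.ones(69)
mask[9:15] = 0.0                                   # two hypothetical missing joints
w = np.array([0.1, 0.1, 0.1])                      # hypothetical prior weights
tau0 = np.zeros(3)

tau_hat = fit_rotation(pose, target, mask, w, tau0)
start = objective(tau0, pose, target, mask, w, tau0)
final = objective(tau_hat, pose, target, mask, w, tau0)
```

The line search only accepts steps that decrease the objective, so `final` never exceeds `start`. The objective is non-convex in the angles, so the sketch finds a local minimizer only.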



The gradient with respect to $\boldsymbol{\tau}$ can be used to solve this problem. Note again that the notation $\boldsymbol{\tau} \circ \mathbf{y}$ denotes the operation that successively rotates the pose by three rotation angles around the x, y and z axes. Let $\mathbf{t}_1 = (\frac{\pi}{2}, 0, 0)$, $\mathbf{t}_2 = (0, \frac{\pi}{2}, 0)$ and $\mathbf{t}_3 = (0, 0, \frac{\pi}{2})$, and let $\mathbf{y}$ stand for $\mathbf{A}\mathbf{x}$; then the gradient $\mathbf{g}$ of the above objective function is given by

$$g^{(i)} = \langle \mathbf{P}\{\boldsymbol{\tau} \circ \mathbf{y} - \tilde{\mathbf{y}}_0\}, \; \mathbf{P}\{(\boldsymbol{\tau} + \mathbf{t}_i) \circ \mathbf{y}\} \rangle + w^{(i)} (\tau^{(i)} - \tau_0^{(i)}) \qquad (13)$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product.

We alternately solve the two sub-problems $(P_{1a})$ and $(P_{1b})$ until convergence is reached. Once we have found the final sparse coefficients $\mathbf{x}$ and rotation parameters $\boldsymbol{\tau}$, we can determine the joint angles, denoted by the vector $\mathbf{q}$, by solving the IK problem:

$$(P_2) \quad \min \; \|\mathbf{A}\mathbf{x} - s \cdot f(\mathbf{q})\|_2^2 \qquad (14)$$
$$\text{s.t.} \quad s > 0$$

Because of the involvement of the Jacobian matrix, the Hessian for problem $(P_2)$ is difficult to find. Moreover, the Jacobian method, gradient descent and quasi-Newton methods seem to be less efficient for this problem because of the unknown arbitrary scaling $s$. Since we already know the lengths of all bones in our case, we can leverage this knowledge to determine the normalized pose $\hat{\mathbf{y}} = \frac{1}{s}\mathbf{A}\mathbf{x}$. This process is referred to as Pose Normalization and is presented in Algorithm 1. Note that the normalization scheme does not simply normalize the pose vector in the $\ell_2$-norm sense.

After finding the normalized pose $\hat{\mathbf{y}}$, we can apply the Jacobian method [GM85] to find $\mathbf{q}$ by solving the following non-linear system:

$$f(\mathbf{q}) = \hat{\mathbf{y}} \qquad (15)$$

Convergence and complexity. By breaking the pose synthesis problem $(P_0)$ into sub-problems $(P_1)$ and $(P_2)$, we greatly reduce the problem complexity. The assumption behind this break-down is that the residual of the first term in (4) vanishes, which corresponds to setting $\gamma$ to a large value. Under this setting the assumption holds and the break-down is justified. The convergence of problem $(P_1)$ is guaranteed, as in each iteration both OMP and gradient descent decrease the objective value and the objective (7) is bounded below. The convergence of $(P_2)$ is also guaranteed, as the pose normalization algorithm is deterministic with constant complexity, and the Jacobian method with a complete and normalized target pose usually converges within 20 iterations in our experiments.

Algorithm 1 Pose normalization

Input: A pose $\mathbf{y}$ in Euclidean space, standard bone length matrix $\mathbf{L}$.
Output: A new pose $\hat{\mathbf{y}}$.

Divide the skeleton into five chains (see Figure 2) as follows:
    $c_1 = (1, 2, 3, 4, 5)$
    $c_2 = (1, 6, 7, 8, 9)$
    $c_3 = (1, 10, 11, 12, 13, 14, 15)$
    $c_4 = (12, 16, 17, 18, 19)$
    $c_5 = (12, 20, 21, 22, 23)$
Initialize $\Delta\mathbf{y}^{(k)} = \mathbf{0}$ for $k = 1, \dots, 23$ and $\hat{\mathbf{y}}^{(1)} = \mathbf{y}^{(1)}$
for $i = 1 \to 5$ do
    for $j = 2 \to \mathrm{length}(c_i)$ do
        Let $k = c_i[j]$, $k - 1 = c_i[j - 1]$; apply:
            $\Delta\mathbf{y}^{(k)} = \Delta\mathbf{y}^{(k-1)} + \left( \frac{L_{k,k-1}}{\|\mathbf{y}^{(k)} - \mathbf{y}^{(k-1)}\|_2} - 1 \right) (\mathbf{y}^{(k)} - \mathbf{y}^{(k-1)})$
            $\hat{\mathbf{y}}^{(k)} = \mathbf{y}^{(k)} + \Delta\mathbf{y}^{(k)}$
    end for
end for

where $\mathbf{y}^{(k)}$ is the 3D coordinate of joint $k$. Joint $k - 1$ is the parent of joint $k$, as already arranged in the chain. $L_{k,k-1}$ is the standard bone length between joints $k$ and $k - 1$. $\hat{\mathbf{y}}^{(k)}$ is the new position of joint $k$.

Figure 2: Skeleton configuration. Each number is a joint ID labelling the corresponding joint. The skeleton can be divided into five chains: the torso, both arms and both legs.
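The idea behind pose normalization can be sketched as follows, assuming the five joint chains listed in Algorithm 1. For brevity the sketch places each joint relative to its already normalized parent instead of accumulating offsets, which produces bones with the standard lengths and the original directions. The dictionary of bone lengths is hypothetical (unit length for every bone), and the function name is an assumption for this example.

```python
import numpy as np

# Joint chains as listed in Algorithm 1; chain 3 is processed before
# chains 4 and 5 because they start at joint 12.
CHAINS = [
    (1, 2, 3, 4, 5),
    (1, 6, 7, 8, 9),
    (1, 10, 11, 12, 13, 14, 15),
    (12, 16, 17, 18, 19),
    (12, 20, 21, 22, 23),
]

def normalize_pose(pose, bone_length):
    """Rescale every bone to its standard length while keeping its direction.

    pose: array of shape (23, 3); row k-1 holds the position of joint k.
    bone_length: dict mapping (parent, child) joint IDs to the standard length.
    """
    new_pose = pose.copy()                         # the root joint stays in place
    for chain in CHAINS:
        for parent, child in zip(chain[:-1], chain[1:]):
            bone = pose[child - 1] - pose[parent - 1]
            direction = bone / np.linalg.norm(bone)
            new_pose[child - 1] = (
                new_pose[parent - 1] + bone_length[(parent, child)] * direction
            )
    return new_pose

rng = np.random.default_rng(0)
pose = rng.standard_normal((23, 3))
pose = pose - pose[0]                              # root joint at the origin
# Hypothetical standard lengths: every bone has unit length.
lengths = {(p, c): 1.0 for ch in CHAINS for p, c in zip(ch[:-1], ch[1:])}

normalized = normalize_pose(pose, lengths)
rescaled = normalize_pose(2.5 * pose, lengths)     # same pose at another scale
```

With the root at the origin, `normalized` and `rescaled` coincide, which is why the unknown scale $s$ does not need to be estimated separately once the bone lengths are known.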

4. Applications and Experiments 

4.1. Experimental setting 

The training samples we used are obtained from the CMU motion capture website. We manually trim the toes and fingers from the pose data. These data are originally in a 62-dimensional angle space, and they are trimmed to 46 dimensions so that the resulting skeletons contain only significant DOFs, as in [WC11]. When converted to Euclidean space, the skeleton model has 23 joints, each in three dimensions, so the total dimension of a pose is $D = 69$. We pre-shift all the training samples to be rooted at the origin and set the global orientation to zero. This corresponds to setting the first six components of the angle vector to zero.

We determine the size $n$ of the pose dictionary by the following procedure: given a target learning error $\epsilon_t$, we randomly sample $n_0$ poses for the pose dictionary and use them to test the sparse coding error $\epsilon_s$ of the training set. If $\epsilon_s < \delta\epsilon_t$, we set $n$ to $n_0$ and use the currently sampled poses as the initialization for the dictionary learning algorithm; otherwise, we set $n_0$ to $1.5 n_0$ and continue the search. We use $\epsilon_s < \delta\epsilon_t$ as the criterion, with $\delta$ usually set to 2, because dictionary learning can usually decrease the error by 50%, as we found in our experiments.
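The search procedure above can be sketched as a short loop. The function names, the default starting size and the 1-sparse error measure are assumptions for illustration. In particular, `one_sparse_mse` is a simple stand-in (equivalent to OMP with a single atom) that keeps the example self-contained; it is not the sparse coder with cardinality bound $K$ used in our experiments.

```python
import numpy as np

def one_sparse_mse(Y, atoms):
    """Mean squared error when each pose is coded by its single best atom."""
    A = atoms / np.linalg.norm(atoms, axis=0)
    captured = np.max((A.T @ Y) ** 2, axis=0)      # energy explained by best atom
    return float(np.mean((np.sum(Y ** 2, axis=0) - captured) / Y.shape[0]))

def search_dictionary_size(Y, target_error, coding_error, n0=8,
                           delta=2.0, growth=1.5, seed=0):
    """Grow a randomly sampled dictionary until its coding error is below delta * target."""
    rng = np.random.default_rng(seed)
    m = Y.shape[1]
    n = min(n0, m)
    while True:
        atoms = Y[:, rng.choice(m, size=n, replace=False)]
        err = coding_error(Y, atoms)
        if err < delta * target_error or n == m:   # n == m: all poses are atoms
            return n, atoms, err
        n = min(m, int(np.ceil(growth * n)))

rng = np.random.default_rng(0)
Y = rng.standard_normal((6, 300))                  # toy stand-in for training poses
target_error = 0.2
n, atoms, err = search_dictionary_size(Y, target_error, one_sparse_mse)
```

The returned `atoms` would then serve as the initialization for dictionary learning. The loop is guaranteed to stop because the size is capped at the number of training poses.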

4.2. Large-scale Comparison 

In the large-scale comparison, we use the whole database from CMU, which sums up to 4150384 poses, covering a variety of motion styles ranging from basic types such as walking and running to more complex types such as basketball and golf. We randomly sample 50% of the poses for training all models and use the rest for testing. In training, we set the cardinality upper bound $K$ to 3. The resulting pose dictionary consists of 262759 atoms.

We test our model and the other models on the testing set, which consists of 2075280 poses. The other models are MFA, the model with a Gaussian prior, the model that first clusters the data and then builds a Gaussian prior for each cluster (CG), LPCA and PCA. The testing scheme is as follows.

As mentioned, the existing models can be generalized to (1)
(except for PCA, which is discussed below), in which the
parameter $\lambda$ is important in determining the
resulting pose: its value provides a trade-off between the
prior and the likelihood. If the noise level is high, the
prior
| Task | Our model | MFA | Gaussian | LPCA | CG | PCA |
| Dense noise | 0.12 | 0.26 | 0.25 | 0.20 | 0.95 | 3.34 |
| Sparse noise | 0.05 | 0.15 | 0.14 | 1.02 | 0.17 | 1.36 |
| Completion | 0.01 | 0.24 | 0.30 | 0.80 | 0.20 | 4.11 |

Table 1: Average MSE of the large-scale comparison for
de-noising and completion. The total number of testing poses
for every task is 2075280. Our model performs better than
the other models on all three tasks. PCA obtains
considerably larger errors on all tasks because the
principal components cannot explain such a large dataset.



| Subject | No. Frames | Our model | MFA | Gaussian | LPCA | CG | SIK | PCA |
| 07 | 2161 | 0.02 | 0.13 | 0.08 | 0.07 | 0.02 | 0.70 | 0.07 |
| 09 | 769 | 0.03 | 0.08 | 0.10 | 0.06 | 0.06 | 1.26 | 0.09 |
| 63 | 7529 | 0.07 | 0.41 | 2.44 | 0.35 | 0.34 | 3.90 | 0.41 |
| 102 | 4252 | 0.10 | 1.39 | 0.17 | 0.08 | 0.11 | 1.37 | 1.28 |


Table 2: Average MSE of small-scale comparison for de- 
noising when the noise is dense standard Gaussian noise. 
For subject 07, which contains similar styles of walking mo- 
tions, all the models perform very well. However, when the 
motions become more complex, for example, subject 63, our 
model outperforms all other models. 



should be trusted more than the likelihood, thus $\lambda$
should be decreased. The same goes for K in our model.
Therefore, the value of $\lambda$ for all other models and
of K for our synthesis model are chosen by brute-force
search for a fair comparison. Specifically, to find the
approximately best value, we first randomly sample 0.1% of
the poses (about 2000) from the training set and use them to
select the best value within a proper interval. The best
value for each model is then used for testing on the whole
testing set.
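
The brute-force selection above can be illustrated with a small sketch. It assumes, purely for illustration, a Gaussian-prior de-noiser that minimizes lam * ||y - x||^2 + (x - mu)^T Sigma^{-1} (x - mu), so that a larger trade-off value trusts the observation more; the exact form of (1), the candidate grid, the toy data and all function names are assumptions rather than the paper's settings.

```python
import numpy as np

def gaussian_prior_denoise(Y_noisy, mu, Sigma_inv, lam):
    """Closed-form minimizer of lam*||y - x||^2 + (x - mu)^T Sigma_inv (x - mu)."""
    D = mu.shape[0]
    lhs = lam * np.eye(D) + Sigma_inv
    rhs = lam * Y_noisy + (Sigma_inv @ mu)[:, None]
    return np.linalg.solve(lhs, rhs)

def brute_force_select(candidates, recover, Y_noisy, Y_clean):
    """Try every candidate value on a small subset and keep the lowest-MSE one."""
    errors = [float(np.mean((recover(Y_noisy, v) - Y_clean) ** 2)) for v in candidates]
    return candidates[int(np.argmin(errors))], errors

# Toy stand-in data (hypothetical): 6-dimensional "poses" on a 2-D subspace.
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 2))
train = B @ rng.standard_normal((2, 500))
mu = train.mean(axis=1)
Sigma_inv = np.linalg.inv(np.cov(train) + 1e-3 * np.eye(6))

clean = B @ rng.standard_normal((2, 50))          # small selection subset
noisy = clean + rng.standard_normal(clean.shape)  # dense standard Gaussian noise

candidates = [0.01, 0.1, 1.0, 10.0, 100.0]
recover = lambda Yn, lam: gaussian_prior_denoise(Yn, mu, Sigma_inv, lam)
best_lam, errors = brute_force_select(candidates, recover, noisy, clean)
```

The same selection routine could be reused for the cardinality bound K of the sparse model by passing a list of integers and a sparse-coding recovery function.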

For MFA, we use the same setting as stated in [WC11], except
for $\lambda$, which was not given originally and is found
by brute-force search. We set the number of clusters to 20
for CG and the neighbourhood size to 100 for LPCA. For PCA,
we use the leading principal components such that 90% of the
energy of the corresponding eigenvalues is preserved. Since
SIK is too computationally demanding, we have omitted it
from this large-scale comparison.
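
As an illustration of the PCA setting, the number of retained components can be chosen from the cumulative eigenvalue energy. The sketch below assumes poses are stored as columns and that "energy" means the sum of the covariance eigenvalues; the function name is hypothetical.

```python
import numpy as np

def pca_basis(Y, energy=0.90):
    """Return the mean, the leading principal directions and their count r,
    where r is the smallest number of components whose eigenvalues sum to
    at least `energy` of the total."""
    mu = Y.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(Y - mu, full_matrices=False)
    eigvals = s ** 2                      # proportional to covariance eigenvalues
    ratio = np.cumsum(eigvals) / eigvals.sum()
    r = int(np.searchsorted(ratio, energy) + 1)
    return mu, U[:, :r], r
```

A pose y would then be projected as `mu + U @ (U.T @ (y - mu))`, with `y` and `mu` as column vectors.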

We test the performance of de-noising and pose completion.
For de-noising, we test two types of noise: dense and sparse
Gaussian noise. For dense Gaussian noise, we generate
standard Gaussian noise and add it to the testing set. For
sparse Gaussian noise, we generate standard Gaussian noise
and randomly corrupt 20% of the joints. The mean square
error (MSE) of each recovered pose is calculated, and the
average MSE over the whole testing set is shown in Table 1.
We also test the performance of completion when only a small
portion of the joints is observed. Specifically, the inputs
are the 3D coordinates of joints 16, 20, 19, 23, 5 and 9
(see figure 2 for the joint ID map). This missing pattern is
the same as that in [WC11]. The comparison result is also
shown in Table 1. For a visual instance of the large-scale
comparison, see figure 3.
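
The two corruption schemes and the error measure can be sketched as below. The sketch assumes that poses are columns of a 69-row matrix with the three coordinates of each joint stored consecutively, and that the 20% of corrupted joints are drawn independently for every pose; neither detail is stated in the text, and the function names are hypothetical.

```python
import numpy as np

def add_dense_noise(Y, rng):
    """Add standard Gaussian noise to every coordinate of every pose."""
    return Y + rng.standard_normal(Y.shape)

def add_sparse_noise(Y, rng, frac=0.2, dims_per_joint=3):
    """Corrupt a random `frac` of the joints of each pose with Gaussian noise."""
    D, m = Y.shape
    n_joints = D // dims_per_joint
    n_corrupt = int(round(frac * n_joints))
    out = Y.copy()
    for c in range(m):
        joints = rng.choice(n_joints, size=n_corrupt, replace=False)
        # Rows holding the x, y, z coordinates of the chosen joints.
        rows = (joints[:, None] * dims_per_joint + np.arange(dims_per_joint)).ravel()
        out[rows, c] += rng.standard_normal(rows.size)
    return out

def average_mse(Y_rec, Y_true):
    """Per-pose mean square error, averaged over all poses."""
    return float(np.mean(np.mean((Y_rec - Y_true) ** 2, axis=0)))
```

With 23 joints, rounding 20% gives five corrupted joints (15 coordinates) per pose under these assumptions.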
| Subject | No. Frames | Our model | MFA | Gaussian | LPCA | CG | SIK | PCA |
| 07 | 2161 | 0.03 | 0.08 | 0.05 | 0.07 | 0.19 | 0.26 | 0.08 |
| 09 | 769 | OM | 0.05 | 0.06 | OM | 0.05 | 0.70 | 0.05 |
| 63 | 7529 | 0.03 | 0.12 | 0.47 | 0.09 | 0.10 | 0.99 | 0.36 |
| 102 | 4252 | 0.05 | 53)8 | OW | 0.07 | OM | 1.34 | 1.09 |

Table 3: Average MSE of the small-scale comparison for
de-noising when the noise is sparse standard Gaussian noise.
In this test, MFA, LPCA and our model are all robust to
changes in dataset size and motion style, and our model
performs slightly better than the other two.



| Subject | No. Frames | Our model | MFA | Gaussian | LPCA | CG | SIK | PCA |
| 07 | 2161 | 0.07 | on | 0.23 | 0.25 | 0.95 | 0.31 | 0.07 |
| 09 | 769 | 0.09 | 0.09 | 0.26 | 0.26 | 0.25 | 0.74 | 0.05 |
| 63 | 7529 | 0.07 | 0.18 | 0.26 | 0.15 | 0.21 | 3.90 | 0.53 |
| 102 | 4252 | 0.09 | 0.24 | 0.28 | 0.25 | 0.27 | 1.48 | 1.30 |



Table 4: Average MSE of the small-scale comparison for
completion. In this test, our model is the only one that
obtains an MSE smaller than 0.1 for all four subjects.



Even though none of the models considered in this paper
takes motion dynamics into account in the training stage,
we can still compare their performance on motion completion
by applying the completion algorithm to each (incomplete)
pose in the motion, as this gives a good reflection of the
performance of pose completion. This comparison is performed
on a running motion and the result is shown in figure 6. Our
model outperforms the other models in that it better
preserves the pose structure of the upper body (see the
figure caption for more details).

4.3. Small-scale Comparison 

Similar to the above large scale comparison, we also test the 
performance of each model for learning from small datasets. 




Figure 3: A visual snapshot of the large-scale comparison.
From left to right: de-noising result when the noise is
dense, de-noising result when the noise is sparse, and
completion result when only six joints (ID: 16, 20, 19, 23,
5 and 9) are observed. Green: ground-truth, red: corrupted,
cyan: our model, black: MFA, magenta: Gaussian, yellow:
LPCA, blue: CG, gray: PCA.



We choose four subjects from the CMU mocap database
website: 07 (walking), 09 (running), 63 (golf) and 102
(basketball). These subjects are representative, as they
cover different styles of motion and their sizes vary from
1538 to 15079 poses. For each subject, we randomly sample
50% of the poses for training and use the rest for testing.
The training and testing schemes are the same as those in
the large-scale comparison. In the training stage, the
setting for SIK is the same as in [GMHP04]; for the other
models, the setting is similar to the above. We test
de-noising under the two types of noise as well as
completion. The results for these three tasks are shown in
Tables 2, 3 and 4 respectively. As we see, our model
performs best overall on the three tasks even for small
datasets.

4.4. Interactive character posing 

We provide a real-time application of our model to
interactive character posing. The user interface is
implemented in C++ and we use the pose dictionary learned in
the large-scale comparison for pose synthesis. We consider
two kinds of input here. The free-dragging interface
provides a freely edited complete pose as input. Pose
completion takes a set of 2D or 3D points as input and
reconstructs the whole pose. Other inputs are possible, as
long as they can fit into the model, perhaps after some
necessary preprocessing.

Free-dragging A common scenario is that the user drags one
or multiple joints of the skeleton and the computer is
required to respond to this drag by creating a new pose.
If the user is a novice, the edited pose is likely to look
as if it were corrupted by noise. To synthesize a new pose
based on the corrupted pose and the pose dictionary, (Pq) is
solved with P set to the identity matrix. This corresponds
to treating all joints of the input pose $y_0$ as soft
constraints.

As our model is trained on pose data in Euclidean space, it
is well suited to interactive character posing, in which the
user can arbitrarily modify the pose without worrying about
bone-length constraints and angle limits. This is a great
convenience for users because they can move the joints
wherever they want. After the user finishes the
modification, our model synthesizes a natural pose that
satisfies the user's intention regarding the style and
corrects all the violations. See figure 4 for examples.

Pose completion In the pose completion problem, we infer the
whole pose when only a portion of the joints is observed.
It turns out that our model can be conveniently adapted to
solve this problem even though it is trained with full-body
pose data. To do this, we simply set the entries of P
corresponding to the joints that we want to treat as
observed (fixed) to one and the rest to zero. With P
introduced, we can also conveniently incorporate 2D inputs:
given a picture that contains a pose, the user can label the
joints with their 2D coordinates in the picture and
reconstruct a 3D pose. We give two examples in figure 5.
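
The role of P can be illustrated with a small sketch. It assumes that the synthesis problem has the form "minimize ||P(y0 - Ax)||^2 subject to at most K non-zero entries in x" and solves it with a plain masked OMP; bone-length constraints and any other steps of the full method are omitted. Joint IDs are assumed to be 1-based with three consecutive coordinates per joint, and the function names are hypothetical.

```python
import numpy as np

def selection_diag(observed_joints, n_joints=23, dims=3):
    """Diagonal of P: one for coordinates of observed joints, zero elsewhere."""
    p = np.zeros(n_joints * dims)
    for j in observed_joints:                 # assumption: IDs start at 1
        p[(j - 1) * dims:j * dims] = 1.0
    return p

def complete_pose(A, y0, p, K=3):
    """Code the observed coordinates with at most K atoms, then rebuild all joints."""
    obs = p > 0
    A_obs, y_obs = A[obs], y0[obs]
    norms = np.maximum(np.linalg.norm(A_obs, axis=0), 1e-12)
    residual, support, coef = y_obs.copy(), [], np.zeros(0)
    for _ in range(K):
        j = int(np.argmax(np.abs(A_obs.T @ residual) / norms))
        if j in support:
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(A_obs[:, support], y_obs, rcond=None)
        residual = y_obs - A_obs[:, support] @ coef
    # The same coefficients applied to the full atoms give the full pose.
    return A[:, support] @ coef

# Completion pattern used in the experiments; free-dragging uses p = ones.
p_completion = selection_diag([16, 20, 19, 23, 5, 9])
p_free_drag = np.ones(69)
```

Under these assumptions the two interfaces differ only in the mask: an all-ones mask treats every joint as a soft constraint, while a partial mask ignores the unobserved coordinates of the input.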
Figure 4: Some examples of free-dragging and pose
reconstruction results. First and third: modified poses;
second and fourth: the corresponding reconstructed poses.




Figure 5: Reconstructing 3D poses given 2D inputs, shown as
red dots. The 2D coordinates of the inputs are assigned to
the corresponding joints (chosen by users) for pose
reconstruction.



5. Discussions and conclusions 

De-noising and completion We have compared our model with
the existing ones on de-noising and completion. One may
think that de-noising is irrelevant as motion capture data
are usually 'clean'. This is perhaps true for the
already-available databases. In the process of motion
capture, however, de-noising is necessary because of
measurement errors and sensor failures [RRP97, LC10].
Moreover, for interactive character posing, the pose edited 
by users is usually noisy in the sense that it is inconsistent 
with the training set. We can see the process of pose editing 
as a measurement of users' intentions, which will always 
introduce measurement noise. Apart from dense noise, we 
have also considered sparse noise. This is meaningful be- 
cause in motion capture process, noise can be sparse due to 
errors introduced by a few sensors. Similarly, in character
posing, the user may only edit a few joints, making the
measurement noise sparse. Pose completion is also useful,
not only in dealing with motion capture data when the
measurements are incomplete, but also in character posing to
account for users' constraints.

Sparsity The choice of K in the pose synthesis stage depends
on the noise level. It reflects our prior knowledge of the
noise. If we believe that the noise level is high, we can
reduce K; otherwise, we can increase K such that
Figure 6: Comparison of all models for completing the first
128 poses of a running sequence in subject 09 of the CMU
database. All models are trained as in the large-scale
comparison. For a clear view, the poses are shown every 20
frames. The ground-truth is shown in green together with the
results obtained from the models (in blue), which are, from
top to bottom: our model, MFA, Gaussian, CG, LPCA and PCA.
Our model performs best as it better preserves the pose
structure. Among the remaining models, Gaussian, CG and MFA
fail to capture the running style of the upper body (from
the backbone to the head), and defects can also be spotted
in the head and feet. Neither LPCA nor PCA obtains an
acceptable result in this comparison.
it can better approximate the input. This offers a trade-off
between satisfying the users' exact constraints (which may
result in an invalid pose) and synthesizing a realistic and
natural pose.

Combining dictionaries Our synthesis model is flexible in
that it can combine dictionaries that are learned separately
by simply concatenating all the sub-dictionaries. This
provides a convenient way of handling a large-scale training
set in the training stage. In terms of complexity, K-SVD is
O(m^), where m is the size of the training set. By using a
fast clustering algorithm such as k-means, we can divide the
training set into smaller subsets with complexity O(m)
before applying K-SVD to each of them. In this way, the
overall training complexity is reduced. Since K-SVD is a
generalization of k-means, this approximation is analogous
to the hierarchical version of k-means. We use this approach
for the large-scale training.
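
The divide-and-concatenate strategy can be sketched as follows. This is only an outline under stated assumptions: the k-means routine is a basic Lloyd iteration, and `learn_subdictionary` is a hypothetical placeholder that samples normalized poses instead of running K-SVD, so that the example stays short and self-contained.

```python
import numpy as np

def kmeans_labels(Y, k, n_iter=20, seed=0):
    """Basic Lloyd k-means on the columns of Y; returns one label per pose."""
    rng = np.random.default_rng(seed)
    centers = Y[:, rng.choice(Y.shape[1], size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distances between every center and every pose: shape (k, m).
        d2 = ((Y[:, None, :] - centers[:, :, None]) ** 2).sum(axis=0)
        labels = d2.argmin(axis=0)
        for c in range(k):
            if np.any(labels == c):
                centers[:, c] = Y[:, labels == c].mean(axis=1)
    return labels

def learn_subdictionary(Y_sub, n_atoms, seed=0):
    """Placeholder for K-SVD (assumption): sample poses and normalize them."""
    rng = np.random.default_rng(seed)
    n_atoms = min(n_atoms, Y_sub.shape[1])
    atoms = Y_sub[:, rng.choice(Y_sub.shape[1], size=n_atoms, replace=False)]
    return atoms / np.maximum(np.linalg.norm(atoms, axis=0), 1e-12)

def combined_dictionary(Y, k, atoms_per_cluster):
    """Cluster the training set, learn one sub-dictionary per cluster, concatenate."""
    labels = kmeans_labels(Y, k)
    subs = [learn_subdictionary(Y[:, labels == c], atoms_per_cluster, seed=c)
            for c in range(k) if np.any(labels == c)]
    return np.hstack(subs)
```

Replacing the placeholder with a real dictionary learner leaves the concatenation step unchanged, which is the flexibility the paragraph above refers to.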

Physical constraints The only physical constraint we
consider in this paper is the bone length. Angle limits are
not considered here, as we found that solving the problem
(Pq) usually does not violate the angle limit constraints.
However, they can be incorporated into sub-problem (P2) if
necessary, and the problem can then be solved, for example,
similarly to [ZB94].

Connection to subspace models By setting K to a large
number and imposing extra orthogonality constraints on A,
the pose dictionary becomes the basis matrix obtained from
subspace models, and the pose synthesis problem (Pq) is
almost the same as the PCA model, which optimizes the pose
in the PCA subspace (except that we model the pose in
Euclidean space). From this point of view, our model can be
seen as a generalization of the PCA subspace model.

Connection to compressed sensing In compressed sensing
[Don06], the random sensing matrix plays an important role.
Our model is related to compressed sensing, except that we
set the random sensing matrix to be square. This sensing
matrix then has no effect, as we can invert it and remove it
from the model. That is, we do not perform any reduced
measurement on the pose. This is because the input pose
might already be incomplete, as indicated by P. Introducing
a (fat) sensing matrix would complicate the (incomplete)
measurement and make it more difficult to recover the pose.

Connection to nearest-neighbour Although similar in some
sense, our approach is not a nearest-neighbour (NN)
algorithm. Firstly, our model is a parametric model, while
the NN algorithm is not. Secondly, although the OMP
algorithm used in the sparse coding stage resembles NN, it
is in fact a greedy algorithm for solving the sparse coding
problem. Other algorithms such as linear programming,
shrinkage and interior point methods can also be used.
However, we find that OMP is more efficient in our case.



Conclusion In this paper, we have proposed a model for
articulated character posing. We have shown that our model
can be trained to learn the pose dictionary from a
large-scale training set. We have demonstrated how to apply
our model to the de-noising and completion problems, and we
have provided UI examples showing how to use it for
character posing. Experiments have shown that our model
outperforms the existing models in pose de-noising and
completion.

One limitation of our model is that to achieve a small learn- 
ing error, the pose dictionary size could be large for learn- 
ing from a large dataset. This could be a problem for appli- 
cations in devices that have limited memory. Nevertheless, 
our model is currently designed for applications in personal 
computers. 